Quantum experiments yield random data. We show that the most efficient way to store this 
empirical information by a finite number of bits is by means of the vector of square roots of 
observed relative frequencies. This vector has the unique property that its dispersion becomes 
invariant of the underlying probabilities, and therefore invariant of the physical parameters. This 
also extends to the complex square roots, and it remains true under a unitary transformation. 
This reveals quantum theory as a theory for making predictions which are as accurate as the 
input information, without any statistical loss. Our analysis also suggests that from the point of 
view of information a slightly more accurate theory than quantum theory should be possible. 
1 Introduction 

There have been several attempts to find an explanation for quantum theory by looking 
at it as a theory of information. For instance, Wheeler's work is based on statistical 
distinguishability [1], von Weizsäcker's ur-hypothesis starts with empirical yes-no decisions 
[3], Bohr and Ulfbeck emphasize symmetry [2], Brukner and Zeilinger define an elementary 
system as answering only yes or no to any question [4] (see also the essay [5]), and Hardy 
introduces five axioms containing no traditional physical concepts [6]. Hardy also cites 
older axiomatic approaches. Grinbaum bases his derivation of the quantum formalism 
explicitly on information [7]. Luo makes use of Fisher information to find Malus' law 
[8]. Mehrafarin derives interference from empirical input information [9]. Recently, Aerts 
exposed quantum theory as a theory of optimal observation and emphasized a similarity to 
the theory of signal analysis [10]. Grangier gives a compact derivation of quantum theory 
based on the discreteness of the empirical information in quantum experiments (e.g. [11] 
and references therein), which is not unlike the work of Landé [12]. Approaches 
based on structures inherent in probability theory, like the one of Caves et al. (e.g. [13]) 
or of Saunders [14], can also be seen as putting primacy on the concept of information, since 
probability is a way of quantifying information. 

The present paper takes motivation from these works and focusses on a point which 
does not seem to have been touched yet [15]: The raw data of quantum experiments, as 
generic probabilistic experiments, are random numbers. One may then ask on a purely 
informational level, what are meaningful transformations of these random numbers to 
represent the empirical information in an undistorted way? We call a 
representation undistorted if the uncertainty volume of the representation vector (see footnote 1), 
which is due to the finite empirical information, is constant for a given amount of empirical 
information and thus independent of the representation vector itself [16]. 

Footnote 1: Note that the uncertainty volume of a random vector has nothing to do with the 
uncertainty relations of quantum theory. 

We seek such a representation by making use only of the probabilistic paradigm of 
modern physics. We show that there is only one such way of representing observed data 
and that the properties of the representation remain invariant only under linear trans- 
formations. In the limit of infinite empirical information this gives the state vector of 
quantum theory and its linear evolution. But interestingly, for finite information there 
should exist better representations. We comment on this in the discussion. 

The paper is organized as follows: 
Section 2: 
Storing information from a probabilistic yes-no experiment. Encoding the relative frequency 
of yes and no into numbers with fixed credibility of the bits (or any other units). 
Section 3: 
Vector representation of the empirical information. Easier to handle and more symmetry 
for particular representations. 
Section 4: 
Extension to probabilistic experiments with more than two outcomes. Generalisation of 
the method of representation is straightforward, because the rule for encoding turns out 
to be the same as for the yes-no experiment. 
Section 5: 
Transformations of the representation vector. Linear transformations are preferable because 
they introduce no unwanted structure in the representation of information. 
Section 6: 
Discussion. 

2 Storing information from a probabilistic yes-no experiment 

Consider a probabilistic experiment with two possible outcomes, '0' and '1' (e.g. a Stern-Gerlach 
experiment on a spin-1/2 particle, but for the present purpose tossing a biased 
coin is just as good). The probability p of outcome '1' in a single trial is unknown, but 
known to have a definite value because all experimental conditions are well controlled. We 
do N trials in which '1' is obtained L times (and '0' N - L times). However, there are 
only S bits of storage available, and S is too small to store the observed relative frequency 
$\nu = L/N$ accurately. How should we encode the experimental result into the S bits such 
that the probability that these S bits are correct becomes maximal? 

First, we simply store the relative frequency $\nu$ itself. That is, we round it to S bits. 
Let us denote this rounded number by $[\nu]_S$. Now we know that in infinitely many trials $\nu$ 
would approach p. We can therefore trust $[\nu]_S$ to be correct if the difference between $\nu$ 
and p is less than the value of the (S+1)st bit. In other words, if 

$$ |\nu - p| < \frac{1}{2^{S+1}}. \qquad (1) $$

The probability that an experiment with N trials will yield such a $\nu$ shall be denoted by 
$\mathrm{Prob}([\nu]_S)$. It is a function of S, p and N, 

$$ \mathrm{Prob}([\nu]_S) = \sum_{L:\ |L/N - p| < 2^{-(S+1)}} \frac{N!}{L!\,(N-L)!}\, p^L (1-p)^{N-L}, \qquad (2) $$

where the summation is taken over those L for which the condition is true. Fig. 1, curve 
(a), shows this probability as a function of p for N = 4000 trials and S = 6 bits. (Exact 
storage of a result would require $\log_2(4000) \approx 12$ bits.) Note that it is rather low around 
p = 0.5, where it reaches only 0.68. This is because the fluctuation of the relative frequency 
$\nu$ is larger for values of p around 0.5 than it is for values close to 0 or close to 1. 

As a second example, we store the experimental result as quantum theory would suggest. 
We encode the observed probability amplitude. That is, we take $\eta = \sqrt{\nu} = \sqrt{L/N}$ and 
round it to S bits. The resulting number shall be denoted by $[\eta]_S$. What is the probability 
that these S bits are correct? 

Here we must consider that in the limit of infinitely many trials the random number 
$\eta$ will approach the limit $\sqrt{p}$. We can therefore trust $[\eta]_S$ to be correct if the difference 
between $\eta$ and $\sqrt{p}$ is less than the value of the (S+1)st bit. The probability that an 
experiment with N trials will yield such an $\eta$ shall be denoted by $\mathrm{Prob}([\eta]_S)$. It is given 
by 

$$ \mathrm{Prob}([\eta]_S) = \sum_{L:\ |\sqrt{L/N} - \sqrt{p}| < 2^{-(S+1)}} \frac{N!}{L!\,(N-L)!}\, p^L (1-p)^{N-L}, \qquad (3) $$

where the summation is taken over those L for which the condition is true. This probability 
is shown in Fig. 1, curve (b), again for N = 4000 trials. Note that it is not symmetric 
about p = 0.5. Its lowest value is for p close to 0, where it drops to 0.65. This is lower 
than the lowest probability of 6 correct bits when storing the relative frequency directly. 

As a third example we want to find the way of storing the experimental result which 
can guarantee the highest minimum value of the probability that its first S bits are correct. 
We must find a smooth and monotonic mapping $\nu \to \chi$, where $\chi$ is also confined to the 
interval [0,1], such that the largest fluctuations of the random variable $\chi$ that can occur 
for any value of p are smaller than for any other smooth and monotonic function of $\nu$ in 
[0,1]. 

We argue as follows: The standard deviation of the relative frequency $\nu$ is well known 
as 

$$ \sigma_\nu = \sqrt{\frac{p(1-p)}{N}}. \qquad (4) $$

It is largest at p = 0.5, which corresponds to a large fluctuation of the observed random 
variable $\nu$. In order to get a less fluctuating random variable $\chi$ it is therefore reasonable to 
compress the region around $\nu \approx 0.5$ to a narrower region, and to expand the regions close 
to 0 and close to 1. The compression-expansion factor should be proportional to $1/\sigma_\nu$. 
Ideally, this should yield a random variable $\chi(\nu)$ whose fluctuations are independent of 
p. The ratio of the standard deviations of $\chi$ and of $\nu$ should therefore be 

$$ \frac{\sigma_\chi}{\sigma_\nu} = \frac{c}{\sqrt{p(1-p)}}, \qquad (5) $$

where c is a constant. In the limit of large N this can be shown to yield the function 

$$ \chi = \frac{1}{\pi}\arcsin(2\nu - 1) + \frac{1}{2} = \frac{1}{\pi}\arcsin(2L/N - 1) + \frac{1}{2}. \qquad (6) $$

Figure 1: Probability of getting the first 6 bits correctly when taking a specific function of 
the experimentally obtained relative frequency L/N and decomposing it into binary form. 
Shown as a function of the probability p, for N = 4000 trials. (a): $L/N$. (b): $\sqrt{L/N}$. 
(c): $\pi^{-1}\arcsin(2L/N - 1) + 1/2$. 
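
The step from eq. (5) to eq. (6) can be sketched as follows. This is only an outline under two assumptions that are not spelled out in the text: that for large N the ratio of standard deviations may be replaced by the derivative $d\chi/d\nu$ evaluated at $\nu = p$, and that the mapping is normalized such that $\chi(0) = 0$ and $\chi(1) = 1$. The integration constant $c_0$ is our own notation.

```latex
\frac{d\chi}{d\nu} = \frac{c}{\sqrt{\nu(1-\nu)}}
\quad\Longrightarrow\quad
\chi(\nu) = c \int \frac{d\nu}{\sqrt{\nu(1-\nu)}} = c\,\arcsin(2\nu - 1) + c_0 ,
\\[4pt]
\chi(0) = -\frac{c\,\pi}{2} + c_0 = 0, \qquad
\chi(1) = \frac{c\,\pi}{2} + c_0 = 1
\quad\Longrightarrow\quad
c = \frac{1}{\pi}, \qquad c_0 = \frac{1}{2}.
```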



Fig. 1, curve (c), shows the probability that the first 6 bits of this random variable are 
obtained correctly in an experiment of N = 4000 trials. Note that it is nearly constant at 
about 0.88 over the whole range of p. The smallest values are approached close to 0 and 
close to 1, where it drops to 0.84. 
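
The three probabilities discussed in this section can be evaluated directly from the binomial distribution. The following is a minimal Python sketch, not the code used to produce Fig. 1. The function names (`binomial_pmf`, `prob_bits_correct`, `enc_nu`, `enc_eta`, `enc_chi`) are our own illustrative choices, and the condition for the arcsine mapping is assumed to have the same form as in eqs. (2) and (3).

```python
import math


def binomial_pmf(L, N, p):
    """Binomial probability of L successes in N trials (0 < p < 1).

    Evaluated through log-gamma so that N = 4000 does not overflow.
    """
    log_pmf = (math.lgamma(N + 1) - math.lgamma(L + 1) - math.lgamma(N - L + 1)
               + L * math.log(p) + (N - L) * math.log(1.0 - p))
    return math.exp(log_pmf)


def prob_bits_correct(encode, p, N=4000, S=6):
    """Sum the binomial weights of all L whose encoded relative frequency
    lies within 2**-(S+1) of the encoded true probability."""
    tol = 2.0 ** -(S + 1)
    target = encode(p)
    return sum(binomial_pmf(L, N, p)
               for L in range(N + 1)
               if abs(encode(L / N) - target) < tol)


def enc_nu(v):
    """Store the relative frequency itself, as in eq. (2)."""
    return v


def enc_eta(v):
    """Store the square root of the relative frequency, as in eq. (3)."""
    return math.sqrt(v)


def enc_chi(v):
    """Store the arcsine-transformed relative frequency of eq. (6)."""
    return math.asin(2.0 * v - 1.0) / math.pi + 0.5


if __name__ == "__main__":
    for p in (0.05, 0.25, 0.5, 0.75, 0.95):
        row = [prob_bits_correct(f, p) for f in (enc_nu, enc_eta, enc_chi)]
        print(p, ["%.3f" % value for value in row])
```

Scanning p over a finer grid with the same function would give curves to compare against (a), (b) and (c) of Fig. 1.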

Clearly, $\chi$ is the best of the three investigated possibilities of storing the experimental 
result of a probabilistic experiment when fewer storage bits are available than would be 
needed to encode the experimental result precisely. And it seems that it is also the best 
conceivable way, because the probability of getting the first 6 bits correctly tends to be 
constant. Any other function of $\nu$ might improve this probability in some region of p, but 
necessarily at the expense of lowering it in another region of p. 

Nevertheless, it is important to emphasize that the arcsine relation of eq. (6), or its 
inverse, 

$$ \nu = \sin^2\left(\frac{\pi}{2}\chi\right), \qquad (7) $$

is only really the best function in the limit of infinitely many trials. But real experiments 
are always finite. For these there exists an optimal function, whose form depends on the 
number of trials. It differs from the sinusoidal relation for values of p close to 0 and close 
to 1, where it is less curved. (This will be the topic of a future paper [20].) 

It is also interesting to consider the conceptual status of the limit a random variable 
tends to in infinitely many trials. For the relative frequency $\nu$ this is the probability p. For 
the random variable $\chi$ it is a quantity which we shall denote by x. The functional relation 
between the two is, in analogy to the corresponding random variables, $p = \sin^2(\frac{\pi}{2}x)$. This 
is reminiscent of the quantum theoretical phase. But we should be cautious here. The 
quantity x can be thought to exist for any probabilistic yes-no experiment, classical or 
quantum mechanical. It is simply the limit a particular random variable tends to. However, 
it has a property which no other limit of a random variable possesses: The accuracy with 
which it can be known is knowable before the experiment is done, because it is invariant 
of the probability p. (At least for infinitely many trials, but it is a fairly good statement 
even for finitely many trials, as can be seen in the relative constancy of curve (c) in Fig. 1.) 
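
To illustrate why this accuracy is knowable in advance, the standard deviation of $\chi$ can be estimated to first order for large N. This is only a sketch and assumes that the linear error propagation $\sigma_\chi \approx |d\chi/d\nu|\,\sigma_\nu$ is adequate, which it is not for p very close to 0 or 1.

```latex
\sigma_\chi \;\approx\; \left.\frac{d\chi}{d\nu}\right|_{\nu = p} \sigma_\nu
\;=\; \frac{1}{\pi\sqrt{p(1-p)}}\,\sqrt{\frac{p(1-p)}{N}}
\;=\; \frac{1}{\pi\sqrt{N}} .
```

For N = 4000 this gives $\sigma_\chi \approx 0.0050$, whatever the value of p.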



3 Vector representation 



Now the data of the yes-no experiment shall be represented as a two-component real vector. 
This is actually an inefficient method, because the result of a yes-no experiment is only 
one random variable, not two. But quantum theory suggests we should take a closer look 
at such vectors. Clearly, though, the endpoint of such a vector can only lie along a line, 
not within an area. 

In accordance with the previous section, we shall investigate the following three random 
vectors: 

$$ \vec{\nu} = \begin{pmatrix} \nu_1 \\ \nu_2 \end{pmatrix} = \begin{pmatrix} L/N \\ 1 - L/N \end{pmatrix}, \qquad (8) $$

$$ \vec{\eta} = \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix} = \begin{pmatrix} \sqrt{L/N} \\ \sqrt{1 - L/N} \end{pmatrix}, \qquad (9) $$

$$ \vec{\chi} = \begin{pmatrix} \chi_1 \\ \chi_2 \end{pmatrix} = \begin{pmatrix} \frac{1}{2} + \frac{1}{\pi}\arcsin(2L/N - 1) \\ \frac{1}{2} - \frac{1}{\pi}\arcsin(2L/N - 1) \end{pmatrix}. \qquad (10) $$

Here, $\vec{\nu}$ is the vector of relative frequencies of the two possible outcomes, $\vec{\eta}$ is the vector of 
the corresponding square roots (thus the probability amplitude representation of quantum 
theory, except for phases), and $\vec{\chi}$ is the vector derived from our 'best' random variable of 
the previous section. Fig. 2 shows the lines of the possible end points of these vectors in 
the first quadrant of the real plane. 

Figure 2: Graphical representation of the different random vectors $\vec{\nu}$, $\vec{\eta}$ and $\vec{\chi}$. In all three 
graphs the vector corresponds to the experimental result L/N = 0.25. 

We note that the end point of $\vec{\nu}$ can lie on a straight line of length $\sqrt{2}$. The same holds 
for $\vec{\chi}$. And the endpoint of $\vec{\eta}$ can lie on a quarter circle of length $\pi/2$. 

We pose the following question: What is the probability that the experiment of N 
trials yields a vector whose endpoint is no farther from the end point of the true vector 
than the fraction $2^{-(S+1)}$ of the length of the line on which it can possibly lie? In other 
words, we ask, what is the probability that we know the whereabouts of the true end point 
to an accuracy of S bits after the experiment? 

The answer for the vectors $\vec{\nu}$ and $\vec{\chi}$ can be given right away. It is the same as that 
for the scalar quantities $\nu$ and $\chi$ of the previous section, because in each case we are just 
projecting the horizontal axis of the corresponding plot in Fig. 2 onto the line of possible end 
points. Since these lines are straight, both for $\vec{\nu}$ and for $\vec{\chi}$, the statistical properties of 
the scalars $\nu$ and $\chi$ are not distorted when going to the vectors $\vec{\nu}$ and $\vec{\chi}$, respectively. This 
means in particular that, in analogy to the scalar $\chi$, for the vector $\vec{\chi}$ the probability that 
an experiment will yield the whereabouts of its end point correctly to S bits becomes an 
invariant of p as N becomes large. This is evident in Fig. 3, where this probability is shown 
as a function of p. (Note that this probability is really the same as that for the scalar 
random variable $\chi$ in curve (c), Fig. 1.) 

The answer for the vector $\vec{\eta}$ must be sought more formally. We want to find the 
probability that we can trust the experimentally found $\vec{\eta}$ to S bits. This means we want 
to know the probability for outcomes L/N, given p, for which 

$$ \left| \begin{pmatrix} \sqrt{L/N} \\ \sqrt{1 - L/N} \end{pmatrix} - \begin{pmatrix} \sqrt{p} \\ \sqrt{1 - p} \end{pmatrix} \right| < \frac{\pi}{2}\, 2^{-(S+1)}, \qquad (11) $$

where the factor $\pi/2$ is due to the fact that the endpoint of $\vec{\eta}$ is not confined to a curve of 
length 1, but to one of length $\pi/2$. This probability shall be denoted by $\mathrm{Prob}([\vec{\eta}]_S)$. It is 
given by 

$$ \mathrm{Prob}([\vec{\eta}]_S) = \sum_{L\,(\mathrm{selected})} \frac{N!}{L!\,(N-L)!}\, p^L (1-p)^{N-L}, \qquad (12) $$

where the summation is to be taken over those selected L which fulfil condition (11). 
This probability is also shown in Fig. 3. Note that it is exactly the same as that 
for $\vec{\chi}$. This means that in terms of accuracy of representation the vectors $\vec{\eta}$ and $\vec{\chi}$ are 
statistically equivalent representations of the obtained information. The probability that 
the end point of the vectors $\vec{\eta}$ and $\vec{\chi}$ will differ from the respective true end point (the one 
approached in the limit of infinite trials) by less than a certain fraction of its possible range 
becomes invariant of p when N is large. Then it depends only on N and increases when 
N increases. That is why this 'confidence' probability can be specified without knowing 
the experimental data. Knowledge of N is sufficient. But the vector $\vec{\nu}$ does not have this 
invariance property. 
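
The equivalence of the two representations can be checked numerically. The sketch below is a minimal Python illustration, not the code behind Fig. 3. It evaluates eq. (12) with condition (11) for the square-root vector and compares the result with the scalar arcsine variable of the previous section. The function names are our own, and the Euclidean norm is assumed for the distance in condition (11).

```python
import math


def binomial_pmf(L, N, p):
    """Binomial probability of L successes in N trials (0 < p < 1)."""
    log_pmf = (math.lgamma(N + 1) - math.lgamma(L + 1) - math.lgamma(N - L + 1)
               + L * math.log(p) + (N - L) * math.log(1.0 - p))
    return math.exp(log_pmf)


def prob_eta_vector(p, N=4000, S=6):
    """Eq. (12): weight of all L whose square-root vector ends within
    (pi/2) * 2**-(S+1) of the true end point (sqrt(p), sqrt(1-p))."""
    tol = (math.pi / 2.0) * 2.0 ** -(S + 1)
    true_x, true_y = math.sqrt(p), math.sqrt(1.0 - p)
    total = 0.0
    for L in range(N + 1):
        v = L / N
        dist = math.hypot(math.sqrt(v) - true_x, math.sqrt(1.0 - v) - true_y)
        if dist < tol:
            total += binomial_pmf(L, N, p)
    return total


def prob_chi_scalar(p, N=4000, S=6):
    """Same question for the scalar arcsine variable of eq. (6)."""
    tol = 2.0 ** -(S + 1)

    def chi(v):
        return math.asin(2.0 * v - 1.0) / math.pi + 0.5

    target = chi(p)
    return sum(binomial_pmf(L, N, p)
               for L in range(N + 1)
               if abs(chi(L / N) - target) < tol)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(p, round(prob_eta_vector(p), 4), round(prob_chi_scalar(p), 4))
```

In this sketch condition (11) measures a chord of the quarter circle, whereas the condition on $\chi$ corresponds to the arc. For the small tolerances considered here the two differ only marginally.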

A graphical way of understanding the statistical equivalence of $\vec{\eta}$ and $\vec{\chi}$ is to look at how 
$\vec{\chi}$ can be obtained from $\vec{\eta}$. One must only take the quarter circle on which the endpoint of 
$\vec{\eta}$ lies, straighten it, and squeeze the resulting line homogeneously from length $\pi/2$ to length 
$\sqrt{2}$. This gives the line on which the endpoint of $\vec{\chi}$ lies. 

But $\vec{\eta}$ has one additional feature of invariance, which $\vec{\chi}$ does not have. The length of 
vector $\vec{\eta}$ is independent of the data. Interestingly, quantum theory seems to employ just 
this vector (neglecting a complex phase factor), which not only represents the obtained 
information more accurately than virtually all others over the whole range of possible 
results, but which has one more symmetry over equivalent representations. 

Figure 3: Probability of getting the position of the endpoints of the vectors $\vec{\eta}$ (thick 
dashed line) and $\vec{\chi}$ (thin line) correctly to the first 6 bits in their respective domain from 
an experiment of N = 4000 trials. Shown as a function of the probability p. The curves 
coincide with each other. 



4 Extension to a probabilistic experiment with K outcomes 

We shall now look at a general probabilistic experiment in which a single trial can give 
one out of K possible outcomes. An example would be a projective measurement on 
a quantum-mechanical K-level system. (Note that even the most generalized modes of 
measurement are ultimately projective in a higher dimensional Hilbert space than that of 
the original system.) The probabilities for the K different outcomes, $p_1, \ldots, p_K$, whose sum 
is 1, are fixed by the preparation and the kind of projection done on the system. But they 
are unknown. 

In view of the specific invariance properties found for the vector $\vec{\eta}$ in the previous 
section, we shall here only investigate the multidimensional extension of this representation 
vector. And in order to be of general relevance to quantum theory, we add arbitrary 
complex phase factors to the components. Thus $\vec{\eta}$ is now defined as 

$$ \vec{\eta} = \begin{pmatrix} \sqrt{L_1/N}\, e^{i\varphi_1} \\ \vdots \\ \sqrt{L_K/N}\, e^{i\varphi_K} \end{pmatrix}, \qquad (13) $$

where $L_j$ denotes how often the outcome j occurred in the N trials, and the phases $\varphi_j$ are 
simply added on and cannot be determined in the projective measurement whose result $\vec{\eta}$ 
is to represent. 

For reasons of analytical simplicity we will shift our focus onto the dispersion of $\vec{\eta}$. 
We have already remarked that when an experimentally obtained random number or 
random vector has a higher or lower probability of being correct to a desired accuracy, 
this is a consequence of the differing sensitivity of the numerical decomposition to statistical 
fluctuations. Formally, these fluctuations are described by the dispersion or by its square 
root, the standard deviation. The reason why we found that the probability that the 
observed two-component vectors $\vec{\eta}$ and $\vec{\chi}$ differ by no more than $2^{-(S+1)}$ of their respective 
range from their respective true vector becomes invariant of p is that the dispersion of 
both $\vec{\eta}$ and $\vec{\chi}$ becomes invariant of p. And this is not only true for the two-component 
vectors, but also for the corresponding vectors of arbitrary dimension, and even when we 
add arbitrary complex phases. We shall show this for the vector $\vec{\eta}$ of general dimension 
K. 

First we must look at its expectation vector $E(\vec{\eta})$. (Whether the expectation E(.) is a 
vector or a scalar is determined by its argument.) The expectation value of the component 
$\eta_j = \sqrt{L_j/N}\, e^{i\varphi_j}$ is defined as 

$$ E(\eta_j) = \sum_{L_1=0}^{N} \cdots \sum_{L_K=0}^{N} \frac{N!}{L_1! \cdots L_K!}\, p_1^{L_1} \cdots p_K^{L_K}\, \eta_j. \qquad (14) $$

The multiple summation is subject to the constraint $\sum_j L_j = N$. It can be greatly simplified 
by realizing that only the summation over $L_j$ takes into account the factor $\eta_j$. Therefore, 
all other summations can be done independently. This reduces the calculation of $E(\eta_j)$ 
to the case of an experiment with only two instead of K outcomes. We 
only ask in each trial: Has the outcome j happened, yes or no? The statistics of this 
experiment is governed by the binomial distribution, and so we can write, replacing the 
summation index $L_j$ by l for simplicity, 

$$ E(\eta_j) = \sum_{l=0}^{N} \frac{N!}{l!\,(N-l)!}\, p_j^{\,l} (1-p_j)^{N-l} \sqrt{\frac{l}{N}}\, e^{i\varphi_j}. \qquad (15) $$

The calculation must be done numerically. We emphasize that $E(\eta_j)$ is not identical to 
$\sqrt{p_j}\, e^{i\varphi_j}$ for small N, but approaches it for large N. 

Now we turn to the dispersion of $\vec{\eta}$. It shall be denoted by $D^2(\vec{\eta})$. It is defined as the 
expectation value of the squared difference between $\vec{\eta}$ and the expectation of $\vec{\eta}$: 

$$ D^2(\vec{\eta}) = E\left(|\vec{\eta} - E(\vec{\eta})|^2\right). \qquad (16) $$

Note that $|\vec{\eta} - E(\vec{\eta})|^2$ is a real random number given by 

$$ |\vec{\eta} - E(\vec{\eta})|^2 = \sum_{j=1}^{K} |\eta_j - E(\eta_j)|^2. \qquad (17) $$

Since the expectation value of a sum is equal to the sum of the expectation values we have 

$$ E\left(|\vec{\eta} - E(\vec{\eta})|^2\right) = \sum_{j=1}^{K} E\left(|\eta_j - E(\eta_j)|^2\right). \qquad (18) $$



So we must only look at the formal calculation of the expectation value of the squared 
difference for one component of the vector $\vec{\eta}$. We label it $D_j^2$ and it is 

$$ D_j^2 = E\left(|\eta_j - E(\eta_j)|^2\right) = \sum_{l=0}^{N} \frac{N!}{l!\,(N-l)!}\, p_j^{\,l} (1-p_j)^{N-l} \left[ \frac{l}{N} - 2\,\mathrm{Re}\left(\eta_j^* E(\eta_j)\right) + |E(\eta_j)|^2 \right], \qquad (19) $$

where $\eta_j = \sqrt{l/N}\, e^{i\varphi_j}$. The result is obtained numerically and is shown in Fig. 4 as a 
function of $p_j$. We note that, when multiplied by N, it approaches $\frac{1}{4}(1 - p_j)$. And it is 
independent of the phase $\varphi_j$. With (18) the dispersion of the whole vector $\vec{\eta}$, when also 
multiplied by N, therefore tends to a constant value, which is $\frac{K-1}{4}$. A deviation exists 
only when one or several of the $p_j$ are close to 0, but it disappears when N becomes 
large. We can therefore conclude that the dispersion of the representation vector $\vec{\eta}$ of the 
result of a probabilistic experiment with K different outcomes in a single trial tends to 
$\frac{K-1}{4N}$ when N becomes large. It therefore tends to become an invariant of the $p_j$. This 
means that the accuracy with which the true vector is known (i.e. the one which $\vec{\eta}$ 
approaches in the limit of infinitely many trials) only depends on the number of trials. 
In other words, it is sufficient that we know N in order to be able to specify a small 
hypersphere around the endpoint of the experimentally determined vector $\vec{\eta}$ within which 
the endpoint of the true vector will lie with a certain confidence probability. As this 
confidence probability is independent of the $p_j$ for large N, it is also the highest achievable 
for any representation. So quantum theory indeed picked a good way of representing empirical 
information. (Having done this analysis I encountered a very illuminating paper 
by Caves and Fuchs [17], who defined the representation of the state vector by a finite 
number of bits as the quantum information content of the state. In our case this would 
be the total number of bits with which we know $\vec{\eta}$ to a certain confidence level, which are 
(K - 1)S bits, because we know no phases, and the K-th component follows from unitarity.) 
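
The numerical evaluation of eqs. (15) and (19) is straightforward. The following is a minimal Python sketch under our own naming (`component_dispersion`, `total_dispersion`), not the code used for Fig. 4. It uses the binomial reduction described above, so each component is treated as a yes-no experiment with probability $p_j$.

```python
import cmath
import math


def binomial_pmf(l, N, p):
    """Binomial probability of l successes in N trials (0 < p < 1)."""
    log_pmf = (math.lgamma(N + 1) - math.lgamma(l + 1) - math.lgamma(N - l + 1)
               + l * math.log(p) + (N - l) * math.log(1.0 - p))
    return math.exp(log_pmf)


def component_dispersion(p_j, N, phase=0.0):
    """Dispersion D_j^2 of eta_j = sqrt(l/N) * exp(i*phase), eq. (19)."""
    weights = [binomial_pmf(l, N, p_j) for l in range(N + 1)]
    etas = [math.sqrt(l / N) * cmath.exp(1j * phase) for l in range(N + 1)]
    mean = sum(w * e for w, e in zip(weights, etas))  # eq. (15)
    # |eta - E|^2 equals l/N - 2 Re(eta* E) + |E|^2, the bracket in eq. (19).
    return sum(w * abs(e - mean) ** 2 for w, e in zip(weights, etas))


def total_dispersion(probabilities, N, phases=None):
    """Dispersion of the whole vector as the sum over components, eq. (18)."""
    if phases is None:
        phases = [0.0] * len(probabilities)
    return sum(component_dispersion(p, N, ph)
               for p, ph in zip(probabilities, phases))


if __name__ == "__main__":
    N = 4000
    for p in (0.1, 0.5, 0.9):
        print(p, N * component_dispersion(p, N), (1.0 - p) / 4.0)
    print(N * total_dispersion([0.2, 0.3, 0.5], N), (3 - 1) / 4.0)
```

The printed pairs compare N times the computed dispersion with the limiting values $(1 - p_j)/4$ and $(K - 1)/4$ quoted above.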



5 Transformations of the representation vector 

Here we want to investigate which transformations can be made on $\vec{\eta}$ in order to obtain 
another vector $\vec{\psi}$ which has the same invariance properties as $\vec{\eta}$ and perhaps even additional 
ones, and yet represents the empirical information without any loss. This means that once $\vec{\psi}$ 
is obtained from the relative frequencies it must be possible to get back 
these relative frequencies when one is given only $\vec{\psi}$ and the arbitrary phases put into $\vec{\eta}$. 

First, we will look at transformations for K = 2. So we are again considering a yes-no 
experiment like a projective measurement on a quantum mechanical 2-level system. 
Specifically, we consider the following situation. A two-level system has been repeatedly 
prepared in some manner and each time we have done a certain measurement on it and so 
have obtained the vector $\vec{\eta}$. Now we want to do the whole experiment again, but instead 
of doing the same projective measurement we let the system evolve for some time and then 
do this measurement. Does our knowledge of $\vec{\eta}$ permit us to make any general statement 
about what the representation vector of the result of the second measurement will look like? 
In other words, we are asking whether we can find any general rule of how the system 
will evolve in time, or to be even more precise, what our representation of our knowledge 
of the system will look like as a function of the parameter time. 

Figure 4: Dispersion $D_j^2$ of component $\eta_j = \sqrt{L_j/N}$, multiplied by the number of trials, as a 
function of $p_j$. Thin line: N = 100. Thick line: N = 4000. 

A general rule can only be found if we adopt a general principle. And here it seems 
obvious to assume that our knowledge of the system must not deteriorate in time. For, if 
the second measurement revealed that it did deteriorate, we would be forced to postulate 
that something unaccounted for has happened. In practice this means we would be forced 
to acknowledge that we were not aware of all the conditions the system was exposed to 
during the time interval of interest. Therefore, we want to look for a transformation of 
the vector $\vec{\eta}$ into a vector $\vec{\psi}$, such that the dispersion of $\vec{\psi}$ is the same as that of $\vec{\eta}$, and 
it must have the same invariance property (i.e. it must not depend on the $p_j$, at least for 
large N). 

Does the quantum mechanical rule of linear transformations conform to this principle? 
Here, a transformation of $\vec{\eta}$ is effected by a general rotation 

$$ R = e^{i\vec{\tau}\cdot\vec{\sigma}}, \qquad (20) $$

where $\vec{\sigma}$ are the Pauli matrices, and $\vec{\tau}$ contains the duration, strength and direction of the 
interaction. Writing out R explicitly we have 

$$ R = \begin{pmatrix} \cos\tau + i\sin\tau\cos\theta & \sin\tau\sin\theta\, e^{-i\phi} \\ -\sin\tau\sin\theta\, e^{i\phi} & \cos\tau - i\sin\tau\cos\theta \end{pmatrix}, \qquad (21) $$

where $\theta$ and $\phi$ specify the direction of $\vec{\tau}$ in polar coordinates and the scalar $\tau$ expresses the 
angle of rotation. Of course, we could have a succession of such rotations with different $\vec{\tau}$. 
The vector $\vec{\psi}$ now is 

$$ \vec{\psi} = R\,\vec{\eta}. \qquad (22) $$

Its dispersion can be calculated in complete analogy to that of $\vec{\eta}$ (eqs. (14)-(19)). The result 
for a specific rotation was obtained numerically and is shown in Fig. 5. The left drawing 
shows the dispersions of $\eta_1$, $\eta_2$ and of the vector $\vec{\eta}$ itself. And the right drawing shows the 
corresponding dispersions for $\vec{\psi}$. Note that the dispersions for $\psi_1$ and $\psi_2$ show a different 
behaviour as a function of the probability $p_1$ to obtain '1' in a single trial (of the first 
measurement!). But the dispersion of the whole vector $\vec{\psi}$ tends to become an invariant of 
$p_1$ as the number of trials becomes large, just like that of $\vec{\eta}$, and it also approaches the 
same value $\frac{1}{4N}$. Therefore, the quantum mechanical evolution, at least for the two-level 
system, does indeed conform to the principle we hoped to see fulfilled, namely, that the 
input information is conserved. 
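
The statement about the two-level system can be reproduced with a short calculation. The following Python sketch is only an illustration under our own naming (`rotation`, `vector_dispersion`, `dispersions`), not the code behind Fig. 5. It builds the matrix of eq. (21), applies it to every possible outcome vector of N trials, and compares the total dispersions before and after the transformation. The phases put into the input vector are arbitrary values chosen for the example.

```python
import cmath
import math


def binomial_pmf(L, N, p):
    """Binomial probability of L successes in N trials (0 < p < 1)."""
    log_pmf = (math.lgamma(N + 1) - math.lgamma(L + 1) - math.lgamma(N - L + 1)
               + L * math.log(p) + (N - L) * math.log(1.0 - p))
    return math.exp(log_pmf)


def rotation(tau, theta, phi):
    """2x2 matrix of eq. (21), written as ((a, b), (-b*, a*)); angles in radians."""
    a = complex(math.cos(tau), math.sin(tau) * math.cos(theta))
    b = math.sin(tau) * math.sin(theta) * cmath.exp(-1j * phi)
    return ((a, b), (-b.conjugate(), a.conjugate()))


def vector_dispersion(vectors, weights):
    """Expectation of |v - E(v)|^2 over a weighted list of complex vectors."""
    dim = len(vectors[0])
    mean = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    return sum(w * sum(abs(v[k] - mean[k]) ** 2 for k in range(dim))
               for w, v in zip(weights, vectors))


def dispersions(p1, N, R, phases=(0.0, 0.0)):
    """Return the total dispersions of the input vector and of R applied to it."""
    weights = [binomial_pmf(L, N, p1) for L in range(N + 1)]
    etas = [(math.sqrt(L / N) * cmath.exp(1j * phases[0]),
             math.sqrt(1.0 - L / N) * cmath.exp(1j * phases[1]))
            for L in range(N + 1)]
    psis = [(R[0][0] * e[0] + R[0][1] * e[1],
             R[1][0] * e[0] + R[1][1] * e[1]) for e in etas]
    return vector_dispersion(etas, weights), vector_dispersion(psis, weights)


if __name__ == "__main__":
    # Rotation parameters taken from the caption of Fig. 5 (in degrees).
    R = rotation(math.radians(75), math.radians(50), math.radians(110))
    N = 4000
    for p1 in (0.1, 0.3, 0.5, 0.8):
        d_eta, d_psi = dispersions(p1, N, R, phases=(0.4, 1.1))
        print(p1, N * d_eta, N * d_psi)
```

Because R is unitary, the two totals agree up to rounding error for every $p_1$, while the individual components of the transformed vector behave differently.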




Figure 5: Left side: Dispersions of $\eta_1$ (thin solid line), $\eta_2$ (thin dashed line) and of $\vec{\eta}$ (thick 
horizontal line). Right side: Dispersions of $\psi_1$ (thin solid line), $\psi_2$ (thin dashed line) and 
of $\vec{\psi}$ (thick horizontal line). Both for N = 4000 trials. Parameters for rotation matrix R 
in degrees: $\tau = 75$, $\theta = 50$, $\phi = 110$. 



The extension to the K-level system is straightforward. We can write any unitary 
transformation of the complex vector $\vec{\eta}$ with K components (eq. (13)) as a sequence of 
transformations applied to all possible two-dimensional subspaces. Thus we have to define 
matrices $T_{ij}$ ($i = 1, \ldots, K-1$ and $j = i+1, \ldots, K$), which are all identical to the 
K-dimensional identity matrix I, except that the elements $I_{ii}$, $I_{ij}$, $I_{ji}$ and $I_{jj}$ are replaced 
by the elements forming the 2-dimensional unitary matrix (eq. (21)), with suitably chosen 
parameters. There exist K(K - 1)/2 such matrices $T_{ij}$. A method of constructing an 
arbitrary unitary K x K matrix as a product $\prod_{i,j} T_{ij}$ has been given by Reck et al. [18], 
following Murnaghan [19]. 
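
The embedding of a two-dimensional unitary into the K-dimensional identity can be sketched in a few lines of Python. This illustrates only the construction of a single matrix of this kind and a check of its unitarity. It is not the decomposition algorithm of Reck et al., and the helper names (`embed_two_level`, `matmul`, `dagger`, `is_unitary`) as well as the 0-based indices are our own choices.

```python
def embed_two_level(K, i, j, a, b):
    """K x K identity with entries (i,i), (i,j), (j,i), (j,j) replaced by
    the 2x2 block ((a, b), (-b*, a*)). Indices are 0-based, with i < j."""
    a, b = complex(a), complex(b)
    T = [[complex(1.0 if r == c else 0.0) for c in range(K)] for r in range(K)]
    T[i][i] = a
    T[i][j] = b
    T[j][i] = -b.conjugate()
    T[j][j] = a.conjugate()
    return T


def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def dagger(A):
    """Conjugate transpose."""
    n = len(A)
    return [[A[c][r].conjugate() for c in range(n)] for r in range(n)]


def is_unitary(A, tol=1e-12):
    """Check that the conjugate transpose times A is the identity within tol."""
    n = len(A)
    P = matmul(dagger(A), A)
    return all(abs(P[r][c] - (1.0 if r == c else 0.0)) < tol
               for r in range(n) for c in range(n))


if __name__ == "__main__":
    K = 4
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    print(len(pairs), "two-dimensional subspaces for K =", K)
    # Example block with |a|^2 + |b|^2 = 1 (values chosen for illustration).
    U = embed_two_level(K, 0, 2, 0.6, 0.8j)
    V = embed_two_level(K, 1, 3, 0.8j, 0.6)
    print(is_unitary(U), is_unitary(matmul(U, V)))
```

Each such matrix is unitary whenever $a^*a + b^*b = 1$, and so is any product of them.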

It is now sufficient to realize that application of any of the $T_{ij}$ on an input vector $\vec{\eta}$ 
will result in a vector $\vec{\eta}'$, which is identical to $\vec{\eta}$ except for the i-th and j-th components. 
In general, the transformation changes these two components, and the dispersions of $\eta'_i$ and $\eta'_j$ 
will not be the same as those of $\eta_i$ and $\eta_j$, respectively, as can be seen in Fig. 5. But the sum 
of these dispersions does not change through the transformation, as we will show now. 
Following (17) and (18) the sum of the dispersions of $\eta_i$ and $\eta_j$ is 

$$ D_i^2 + D_j^2 = E\left(|\eta_i - E(\eta_i)|^2\right) + E\left(|\eta_j - E(\eta_j)|^2\right). \qquad (23) $$

Abbreviating the general 2x2 rotation matrix, eq. (21), as 

$$ \begin{pmatrix} a & b \\ -b^* & a^* \end{pmatrix}, \qquad (24) $$

where we have $a^*a + b^*b = 1$, the transformed components are $\eta'_i = a\eta_i + b\eta_j$ and 
$\eta'_j = -b^*\eta_i + a^*\eta_j$. The sum of their dispersions is 



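$$ \begin{aligned} D_{i'}^2 + D_{j'}^2 &= E\left(|\eta'_i - E(\eta'_i)|^2\right) + E\left(|\eta'_j - E(\eta'_j)|^2\right) \\ &= E\left(|a\eta_i + b\eta_j - E(a\eta_i + b\eta_j)|^2\right) + E\left(|-b^*\eta_i + a^*\eta_j - E(-b^*\eta_i + a^*\eta_j)|^2\right) \\ &= E\left\{ |\eta_i|^2 + |\eta_j|^2 + |E(\eta_i)|^2 + |E(\eta_j)|^2 - 2\,\mathrm{Re}\left[\eta_i E(\eta_i^*) + \eta_j E(\eta_j^*)\right] \right\}. \end{aligned} \qquad (25) $$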



It is easy to see that this is the same as the sum of the dispersions of the original 
components, eq. (23). Therefore, the total dispersion of $\vec{\eta}'$ will be the same as that of $\vec{\eta}$. Thus 
the unitary transformation of a K-level system conserves the input information, as was 
the case for the 2-level system above. This means the following: When we have done 
a projective measurement on N identically prepared K-level systems and later prepare 
copies in the same way, but let them evolve under well defined conditions before we do 
the projective measurement, our knowledge of the evolution law together with the input 
information obtained in the first measurement enables us to specify the whereabouts of 
the true vector after the evolution with the same accuracy as we were able to specify the 
true input vector. 
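
The step from the second to the third line of eq. (25) can be made explicit as follows. This is a sketch in which the abbreviations $\delta_i = \eta_i - E(\eta_i)$ and $\delta_j = \eta_j - E(\eta_j)$ are our own notation. It uses only the linearity of the expectation and $a^*a + b^*b = 1$.

```latex
|a\delta_i + b\delta_j|^2 = |a|^2|\delta_i|^2 + |b|^2|\delta_j|^2 + 2\,\mathrm{Re}\left(a b^* \delta_i \delta_j^*\right),
\\[4pt]
|-b^*\delta_i + a^*\delta_j|^2 = |b|^2|\delta_i|^2 + |a|^2|\delta_j|^2 - 2\,\mathrm{Re}\left(a b^* \delta_i \delta_j^*\right),
\\[4pt]
|a\delta_i + b\delta_j|^2 + |-b^*\delta_i + a^*\delta_j|^2
= \left(|a|^2 + |b|^2\right)\left(|\delta_i|^2 + |\delta_j|^2\right)
= |\delta_i|^2 + |\delta_j|^2 .
```

Taking the expectation of the last line gives $D_i^2 + D_j^2$, in agreement with eq. (23).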



6 Discussion 

We have set out with the conjecture that quantum theory is an optimal theory of encod- 
ing information obtained in the form of clicks, i.e. outcomes of probabilistic observations. 
For this purpose we have first looked for the most efficient way to represent data from 
a multinomial probability distribution by means of real (rational) numbers, because the 
statistics of quantum observations follows the multinomial distribution. We asked how 
the observed relative frequencies should be mapped onto numbers, such that storing these 
numbers by fewer bits than would actually be needed to store the relative frequencies ex- 
actly, ensures the highest probability that these bits are correct (which means, that they 
coincide with those of the results of an ideal experiment in which infinitely many trials 
can be done). We found that storing the vector whose components are the square roots 
of the relative frequencies is the most efficient way, provided the input data are obtained 
from sufficiently many trials, because the statistical fluctuation of the endpoint of this 
vector, and thus the reliability of this information, becomes invariant of the probabilities 
behind the data. Next we investigated complex square roots of relative frequencies by 
adding arbitrary phase factors. And instead of looking at the reliability of their bit-string 
representation we adopted the formal approach of looking at their dispersion. And here, 
too, we found that when representing the relative frequencies observed in a general 
probabilistic experiment by the vector of complex square roots of these relative frequencies, the 
dispersion of this random vector becomes invariant of the probabilities determining these 
relative frequencies. This is a unique property, because it means that the accuracy 
of this representation of empirical information is independent of physical parameters. It 
is interesting to note that quantum theory employs exactly these vectors (or, to be exact, 
the limits they tend to in infinitely many trials), called probability amplitudes, to describe 
a system. 

We also investigated the properties of the random vector which results from a unitary 
transformation applied to the vector of complex square roots of observed relative frequen- 
cies. It, too, showed the property that its dispersion becomes an invariant both of the 
probabilities determining the input vector, as well as of the parameters fixing the unitary 
transformation. Therefore, it would be an equally efficient way of representing the em- 
pirical information. From the physical point of view this also means, that the quantum 
mechanical evolution, which is described by just such a unitary transformation, preserves 
the information we have about a system. If our original information is such that we can 
specify a small volume in Hilbert space as containing the system, then the evolution will 
neither compress nor expand this volume, although its shape may change. Note that this 
would in general not be true, if we represented the system in any other way than by the 
complex square roots of relative frequencies (probabilities). 

Nevertheless, we also found that for observations with only few trials there should 
exist a better representation, which should deviate notably from the 
complex-square-roots-of-relative-frequencies encoding when these relative frequencies are due to probabilities 
close to 0 or close to 1. This could lead to an apparent deviation from the law of linear 
superposition, e.g., when trying to predict the outcome of an experiment where a particle 
can fire a detector by reaching it over two different paths (or, to extend it to entanglement, 
when there exist two or more indistinguishable possibilities of how several particles can 
fire several detectors in coincidence). We may have measured the probabilities of each 
possibility separately, but with only few trials. And one of these probabilities may be very 
small. Then a simple adding of the complex square roots of the relative frequencies, even 
with suitable phases, may not be the most accurate prediction for the total probability 
amplitude. But one should not see this as a failure of quantum theory. It only tells us 
that quantum theory is a theory working with statistical limits. Its statements refer to 
expectation values obtainable in infinitely many trials of probabilistic experiments. We 
can rightly see it as the backbone of probabilistic science, because we can think that 
in principle any individual observation can be repeated arbitrarily many times. Only if 
all our probabilistic observations were limited to few trials could we not exclude the 
possibility of a predictive theory which is more accurate than quantum theory. We will 
look at this question elsewhere [20]. 